Musical Note Estimation for F0 Trajectories of Singing Voices Based on a Bayesian Semi-Beat-Synchronous HMM
Abstract
This paper presents a statistical method that estimates a sequence of discrete musical notes from a temporal trajectory of vocal F0s. Since considerable effort has been devoted to estimating the frame-level F0s of singing voices from music audio signals, we tackle musical note estimation for those F0s to obtain a symbolic musical score. A naive approach to musical note estimation is to quantize the vocal F0s at a semitone level in every time unit (e.g., half beat). This approach, however, fails when the vocal F0s deviate significantly from those specified by a musical score. The onsets of musical notes are often delayed or advanced relative to beat times, and the vocal F0s fluctuate according to singing expressions. To deal with these deviations, we propose a Bayesian hidden Markov model that allows musical notes to change in semi-synchronization with beat times. Both the semitone-level F0s and the onset deviations of musical notes are regarded as latent variables, and the frequency deviations are modeled by an emission distribution. The musical notes and their onset and frequency deviations are jointly estimated by using Gibbs sampling. Experimental results showed that the proposed method improved the accuracy of musical note estimation over baseline methods.
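The naive baseline mentioned in the abstract (semitone-level quantization in every time unit) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `hz_to_midi` and `quantize_f0`, the use of the median within each unit, the convention that an F0 of 0 marks an unvoiced frame, and the A4 = 440 Hz reference are all assumptions made for illustration.

```python
import math
from statistics import median


def hz_to_midi(f0_hz):
    """Convert Hz to a fractional MIDI note number (assumes A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def quantize_f0(f0_hz, frame_times, unit_times):
    """Naive note estimation: one semitone-level note per time unit.

    f0_hz       -- frame-level F0 values in Hz (0 is assumed to mean unvoiced)
    frame_times -- time stamp of each frame, in seconds
    unit_times  -- boundaries of the time units (e.g., half beats), in seconds
    Returns a list with one MIDI note number (or None) per unit.
    """
    notes = []
    for start, end in zip(unit_times[:-1], unit_times[1:]):
        # Collect the voiced frames that fall inside this unit.
        pitches = [
            hz_to_midi(f)
            for f, t in zip(f0_hz, frame_times)
            if start <= t < end and f > 0
        ]
        # Round the median pitch to the nearest semitone (assumed choice).
        notes.append(round(median(pitches)) if pitches else None)
    return notes


f0 = [440.0, 445.0, 438.0, 0.0, 523.25, 530.0]
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
units = [0.0, 0.25, 0.5, 0.75, 1.0]
notes = quantize_f0(f0, times, units)
print(notes)
```

Because every unit is quantized independently on a fixed grid, a note whose onset is delayed or advanced relative to the grid, or whose F0 fluctuates strongly, is easily assigned the wrong semitone. These are the deviations the abstract says the proposed model treats explicitly.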
Similar resources
Parameter estimation method of F0 control model for singing voices
In this paper, we propose a novel representation of F0 contours that provides a computationally efficient algorithm for automatically estimating the parameters of an F0 control model for singing voices. Although the best known F0 control model, based on a second-order system with a piecewise-constant function as its input, can generate F0 contours of natural singing voices, this model has no me...
Scale- and Rhythm-Aware Musical Note Estimation for Vocal F0 Trajectories Based on a Semi-Tatum-Synchronous Hierarchical Hidden Semi-Markov Model
This paper presents a statistical method that estimates a sequence of musical notes from a vocal F0 trajectory. Since the onset times and F0s of sung notes deviate considerably from the discrete tatums and pitches indicated in a musical score, a score model is crucial for improving time-frequency quantization of the F0s. We thus propose a hierarchical hidden semi-Markov model (HHSMM) that c...
Speech-to-Singing Synthesis System: Vocal Conversion from Speaking Voices to Singing Voices by Controlling Acoustic Features Unique to Singing Voices
Introduction: This paper introduces a speech-to-singing synthesis system, called SingBySpeaking, which can synthesize a singing voice, given a speaking voice reading the lyrics of a song and its musical score. The system is based on the speech manipulation system STRAIGHT and is comprised of four models controlling three acoustic parameters: the fundamental frequency (F0), phoneme duration, and...
Vocal conversion from speaking voice to singing voice using STRAIGHT
A vocal conversion system that can synthesize a singing voice given a speaking voice and a musical score is proposed. It is based on the speech manipulation system STRAIGHT [1], and comprises three models controlling three acoustic features unique to singing voices: the F0, duration, and spectral envelope. Given the musical score and its tempo, the F0 control model generates the F0 contour of t...
Singing Voice Synthesis Based on Deep Neural Networks
Singing voice synthesis techniques have been proposed based on a hidden Markov model (HMM). In these approaches, the spectrum, excitation, and duration of singing voices are simultaneously modeled with context-dependent HMMs and waveforms are generated from the HMMs themselves. However, the quality of the synthesized singing voices still has not reached that of natural singing voices. Deep neur...